@sjpb sjpb commented Mar 7, 2025

Avoids waiting for a timeout if ansible-init happens to have failed already before running site.

@sjpb sjpb marked this pull request as ready for review March 7, 2025 16:55
@sjpb sjpb requested a review from a team as a code owner March 7, 2025 16:55
- name: Check ansible-init hasn't failed (yet)
  # NB: only allows early exit if it has, does not catch future failures!
  assert:
    that: "'failed' not in _ansible_init_failed.stdout"
Collaborator

[rocky@cclr-dev-svn3-dr08-u14 ~]$ sudo systemctl status ansible-init
● ansible-init.service
   Loaded: loaded (/etc/systemd/system/ansible-init.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Fri 2025-03-07 17:37:16 UTC; 1s ago
  Process: 24805 ExecStart=/usr/bin/ansible-init (code=exited, status=1/FAILURE)
 Main PID: 24805 (code=exited, status=1/FAILURE)
[rocky@cclr-dev-svn3-dr08-u14 ~]$ systemctl is-failed ansible-init
activating

Mine looks like that. Does it eventually give up?
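
A minimal sketch of why the `'failed' not in ... stdout` check would not trip here, assuming the task registers the output of `systemctl is-failed`: while systemd is waiting to restart the service, the unit's `ActiveState` is `activating`, not `failed`, even though the last run exited non-zero. The helper names `classify_unit` and `check_unit` are hypothetical and not part of the PR; the properties `ActiveState`, `SubState` and `Result` are standard `systemctl show` properties for service units, and `--value` assumes a reasonably recent systemd.

```shell
#!/bin/sh
# Hypothetical helper: classify a service from three systemd properties.
# Prints "failed", "restarting-after-failure", or "ok".
classify_unit() {
    active_state="$1"   # e.g. active, activating, failed
    sub_state="$2"      # e.g. running, auto-restart, failed
    result="$3"         # e.g. success, exit-code

    if [ "$active_state" = "failed" ]; then
        echo "failed"
    elif [ "$sub_state" = "auto-restart" ] && [ "$result" != "success" ]; then
        # The last run exited badly and systemd is about to retry it;
        # `systemctl is-failed` reports "activating" in this state.
        echo "restarting-after-failure"
    else
        echo "ok"
    fi
}

# Hypothetical wrapper: query a live unit (needs systemd, so it is only
# defined here and never called automatically).
check_unit() {
    unit="$1"
    classify_unit \
        "$(systemctl show "$unit" --property=ActiveState --value)" \
        "$(systemctl show "$unit" --property=SubState --value)" \
        "$(systemctl show "$unit" --property=Result --value)"
}
```

For the state in the status output above, `classify_unit activating auto-restart exit-code` prints `restarting-after-failure`. On a node, `check_unit ansible-init` would run the same classification against the live unit. This only detects a failed attempt that is being retried; it does not say whether systemd will eventually stop retrying.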

Collaborator Author

What situation did this occur in? If it's during dev with a broken compute-init, I'm not too fussed and we should just close this PR; Bertie is going to add some notes to the docs on how to recover from this situation. If it occurs in production it's broken anyway, and Ansible can't really recover it, IMO.

Collaborator

Correct, this was one of the issues when adding new nodes into the cluster. I guess it should be resolved by Bertie's work on making it exit early if the hostvars do not exist.

For other issues though... It is a large timeout, so would be nice if it failed slightly quicker IMO.

Collaborator Author

Yeah, maybe we should knock it down considerably, but that's not this PR. I'm going to close this and we can address it during review of the compute-init update.

sjpb commented Mar 14, 2025

Closed on the basis that this isn't going to help, as ansible-init keeps retrying. The timeout in compute-init should be knocked down though, if possible, on #627.

@sjpb sjpb closed this Mar 14, 2025